AITopics | boolean circuit

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate. However, the constructions from approximation theory often have unrealistic aspects, for example, reliance on infinite precision to memorize target function values. To address this issue, we propose a formal definition of statistically meaningful approximation which requires the approximating network to exhibit good statistical learnability.

approximating turing machine, name change, statistically meaningful approximation, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.61)

Add feedback

Tractability in Structured Probability Spaces

Arthur Choi, Yujia Shen, Adnan Darwiche

Neural Information Processing SystemsNov-21-2025, 13:17:09 GMT

In particular, we introduce the notion of a hierarchical route distribution and show how they can be leveraged to construct tractable PSDDs over route distributions, allowing them to scale to larger maps.

artificial intelligence, machine learning, simple route, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > United States > California > San Francisco County > San Francisco (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers

Xu, Kevin, Sato, Issei

arXiv.org Artificial IntelligenceOct-27-2025

Chain-of-Thought (CoT) and Looped Transformers have been shown to empirically improve performance on reasoning tasks and to theoretically enhance expressivity by recursively increasing the number of computational steps. However, their comparative capabilities are still not well understood. In this paper, we provide a formal analysis of their respective strengths and limitations. We show that Looped Transformers can efficiently simulate parallel computations for deterministic tasks, which we formalize as evaluation over directed acyclic graphs. In contrast, CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems. These separations suggest the tasks for which depth-driven recursion is more suitable, thereby offering practical cues for choosing between reasoning paradigms.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.19245

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)

Add feedback

On efficiently computable functions, deep networks and sparse compositionality

Poggio, Tomaso

arXiv.org Artificial IntelligenceOct-15-2025

We show that \emph{efficient Turing computability} at any fixed input/output precision implies the existence of \emph{compositionally sparse} (bounded-fan-in, polynomial-size) DAG representations and of corresponding neural approximants achieving the target precision. Concretely: if $f:[0,1]^d\to\R^m$ is computable in time polynomial in the bit-depths, then for every pair of precisions $(n,m_{\mathrm{out}})$ there exists a bounded-fan-in Boolean circuit of size and depth $\poly(n+m_{\mathrm{out}})$ computing the discretized map; replacing each gate by a constant-size neural emulator yields a deep network of size/depth $\poly(n+m_{\mathrm{out}})$ that achieves accuracy $\varepsilon=2^{-m_{\mathrm{out}}}$. We also relate these constructions to compositional approximation rates \cite{MhaskarPoggio2016b,poggio_deep_shallow_2017,Poggio2017,Poggio2023HowDS} and to optimization viewed as hierarchical search over sparse structures.

artificial intelligence, machine learning, poly, (15 more...)

arXiv.org Artificial Intelligence

2510.11942

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

263c763d00c6126d37ba670a1fa10847-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 21:15:31 GMT

backdoor, language model, neural network, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > Germany (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

The Implications of Local Correlation on Learning Some Deep Functions

Neural Information Processing SystemsOct-2-2025, 01:21:08 GMT

It is known that learning deep neural-networks is computationally hard in the worst-case. In fact, the proofs of such hardness results show that even weakly learning deep networks is hard. In other words, no efficient algorithm can find a predictor that is slightly better than a random guess. However, we observe that on natural distributions of images, small patches of the input image are correlated to the target label, which implies that on such natural data, efficient weak learning is trivial. While in the distribution-free setting, the celebrated boosting results show that weak learning implies strong learning, in the distribution-specific setting this is not necessarily the case. We introduce a property of distributions, denoted "local correlation", which requires that small patches of the input image and of intermediate layers of the target function are correlated to the target label. We empirically demonstrate that this property holds for the CIFAR and ImageNet data sets. The main technical results of the paper is proving that, for some classes of deep functions, weak learning implies efficient strong learning under the "local correlation" assumption.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.87)

Technology: